Participation de l'IRISA à DeFT2012 : recherche d'information et apprentissage pour la génération de mots-clés (IRISA participation to DeFT2012: information retrieval and machine-learning for keyword generation) [in French]
Abstract
This paper describes IRISA's participation in the DeFT 2012 text-mining challenge, which consisted of automatically assigning or generating keywords for scientific journal articles. Two tasks were proposed, which led us to test two different strategies. For the first task, a list of keywords was provided. Our strategy there is to treat the task as an information retrieval problem in which the keywords are the queries, and each keyword is attributed to the documents ranked best for it. This approach yielded very good results. For the second task, only the articles were known; our approach is chiefly based on a term extraction system whose results are reordered by machine learning. KEYWORDS: keyword generation, term extraction, information retrieval, boosting, decision trees, TermoStat.
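The first strategy (keywords as queries, each attributed to its best-ranked documents) can be illustrated with a minimal sketch. This is not the IRISA system: the abstract does not say which retrieval model was used, so the TF-IDF-style weighting, the whitespace tokenizer, the `top_k` cutoff, and the names `tokenize` and `assign_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; a real system
    # would lemmatize and handle punctuation and multi-word terms).
    return text.lower().split()


def assign_keywords(documents, keywords, top_k=1):
    """Treat each keyword as a query and attribute it to its top_k best-ranked documents.

    documents: dict mapping a document id to its text.
    keywords: list of candidate keywords (the provided terminology).
    Returns a dict mapping document ids to the keywords attributed to them.
    """
    doc_tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    n_docs = len(documents)

    # Document frequency: in how many documents each token occurs.
    df = Counter()
    for counts in doc_tokens.values():
        df.update(counts.keys())

    assigned = defaultdict(list)
    for keyword in keywords:
        scores = {}
        for doc_id, counts in doc_tokens.items():
            score = 0.0
            for tok in tokenize(keyword):
                if counts[tok]:
                    # Hypothetical TF-IDF-style weight, chosen for illustration only.
                    score += counts[tok] * math.log(1 + n_docs / df[tok])
            if score > 0:
                scores[doc_id] = score
        # Rank documents by decreasing score, breaking ties by document id.
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        for doc_id in ranked[:top_k]:
            assigned[doc_id].append(keyword)
    return dict(assigned)
```

Calling `assign_keywords(docs, terminology)` on a small corpus returns, for each document, the keywords for which it ranked among the top `top_k`. Keywords that match no document are simply not attributed.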
Similar resources
Participation du LINA à DEFT2012 (LINA at DEFT2012) [in French]
This article presents the participation of the TALN group at LINA in the défi fouille de textes (DEFT) 2012. Developed specifically for the second task, our system combines the outputs of three different keyword extraction methods. It ranked 2nd out of 9 systems with an F-measure of 21.3%. KEYWORDS: keyword extraction, DEFT 2012, combination of methods.
Indexation libre et contrôlée d'articles scientifiques. Présentation et résultats du défi fouille de textes DEFT2012 (Controlled and free indexing of scientific papers. Presentation and results of the DEFT2012 text-mining challenge) [in French]
In this paper, we present the 2012 edition of the DEFT text-mining challenge. This edition addresses the automatic, keyword-based indexing of scientific papers through two tracks. The first gives participants the terminology of keywords used to index the documents, while the s...
Acquisition terminologique pour identifier les mots-clés d'articles scientifiques (Terminological acquisition for identifying keywords of scientific articles) [in French]
The DEFT2012 challenge aims at automatically identifying the keywords chosen by the authors of scientific articles in the Humanities. A keyword list is provided in track 1. We propose to exploit terminological acquisition approaches. The extracted terms are also sorted and filtered according to their position in ...
Traitement d'attributs inter-dépendants pour la recherche d'information par treillis
Un outil de détection automatique de thèmes
Given the quantity of digital documents available on the Web and the need to develop effective search techniques, information retrieval systems increasingly rely on Natural Language Processing (NLP) techniques that exploit syntactic or semantic information in order to improve the quality of the results returned by the engines for ...